AITopics | scsg method

Collaborating Authors

scsg method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing SystemsNov-21-2025, 15:43:50 GMT

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $E \|\nabla f(x)\|^{2}\le \epsilon$ is $O(\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\})$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.

electronic proceedings, name change, scsg method

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.08)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Reviews: Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing SystemsOct-8-2024, 02:22:21 GMT

Summary: This paper proposes a variant of a family of stochastic optimization algorithms SCSG (closely related to SVRG), and analyzes it in the context of non-convex optimization. The main difference is that the outer loop computes a gradient on a random subset of the data, and the stochastic gradients in the inner loop have a random cardinality and are not restricted to those participating in the outer gradient. The analysis of the stochastic gradients despite this mismatch is the main technical contribution of the paper. The result is a set of new bounds on the performance of the algorithm which improve on both sgd and variance reduced algorithms, at least in the low-medium accuracy regimes. Experimental evidence supports the claims of improvement in objective reduction (both training error and test error) per iteration both before the end of the first pass on data and after, both by SCSG over SGD and by varying batch size variant over the fixed size variant, but only on two networks applied to the venerable and small MNIST dataset. Pros of acceptance: - The claimed results are interesting and shed light on an important regime.

algorithm, contribution, gradient, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.77)

Add feedback

Non-convex Finite-Sum Optimization Via SCSG Methods

Lei, Lihua, Ju, Cheng, Chen, Jianbo, Jordan, Michael I.

Neural Information Processing SystemsFeb-14-2020, 10:14:04 GMT

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $E \ abla f(x)\ {2}\le \epsilon$ is $O(\min\{\epsilon {-5/3}, \epsilon {-1}n {2/3}\})$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.

scsg method

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback